1.  Introduction

Given data x obtained under a parametric model indexed by finite-dimensional θ, the Bayesian
learning process is based on

\[ p(\theta \mid x) = \frac{l(\theta; x)\, p(\theta)}{\int l(\theta; x)\, p(\theta)\, d\theta}, \tag{1.1} \]

the familiar form of Bayes theorem, relating the posterior distribution, p(θ|x), to the likelihood, l(θ;x),
and the prior distribution, p(θ). If θ = (φ, ψ), with interest centering on φ, the joint posterior
distribution is marginalized to give the posterior distribution for φ,

\[ p(\phi \mid x) = \int p(\phi, \psi \mid x)\, d\psi. \tag{1.2} \]

If  summary  inferences  in  the  form  of  posterior  expectations  are  required — for  example,  posterior 
means  and  variances — these  are  based  on 

\[ E[m(\theta) \mid x] = \int m(\theta)\, p(\theta \mid x)\, d\theta, \tag{1.3} \]

for suitable choices of m(·).

Thus, in the continuous case, the integration operation plays a fundamental role in Bayesian
statistics, be it for calculating the normalizing constant in (1.1), the marginal distribution in (1.2), or
the expectation in (1.3). However, except in simple cases, explicit evaluation of such integrals will
rarely be possible, and realistic choices of likelihood and prior will necessitate the use of sophisticated
numerical integration or analytic approximation techniques (see, for example, Smith et al., 1985, 1987;
Tierney and Kadane, 1986). This can pose problems for the applied practitioner seeking routine,
easily implemented procedures. For the student, who may already be puzzled and discomforted by the
intrusion of too much calculus into what ought surely to be a simple, intuitive, statistical learning
process, this can be totally off-putting.

In the following sections, we shall address this problem by taking a new look at Bayes theorem
from a sampling-resampling perspective. This will be seen to open the way both to easily
implemented calculations and to essentially calculus-free insight into the mechanics and uses of Bayes
theorem.


2.  From  densities  to  samples 

As  a  first  step,  we  note  the  essential  duality  between  a  sample  and  the  density  (distribution)  from 
which  it  is  generated.  Clearly,  the  density  generates  the  sample;  conversely,  given  a  sample  we  can 
approximately  recreate  the  density  (as  a  histogram,  a  kernel  density  estimate,  an  empirical  c.d.f.  or 
whatever). 
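
The duality can be seen numerically. The following is a minimal sketch, assuming a standard normal density as the generating density; the function names `histogram_density` and `empirical_cdf`, the seed, and the sample size are illustrative choices of ours, not part of the method.

```python
import random

def histogram_density(sample, lo, hi, bins):
    """Histogram density estimate on [lo, hi): bar height = count / (n * width)."""
    width = (hi - lo) / bins
    counts = [0] * bins
    for value in sample:
        if lo <= value < hi:
            # min() guards against a floating-point index equal to `bins`.
            counts[min(int((value - lo) / width), bins - 1)] += 1
    return [count / (len(sample) * width) for count in counts]

def empirical_cdf(sample, a):
    """Proportion of sample values that are <= a."""
    return sum(1 for value in sample if value <= a) / len(sample)

rng = random.Random(0)  # fixed seed, for reproducibility only
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]  # the density generates the sample

density = histogram_density(sample, -4.0, 4.0, 16)  # the sample recreates the density
cdf_at_zero = empirical_cdf(sample, 0.0)            # the true c.d.f. value at 0 is 0.5
```

The bar heights in `density` approximate the normal density averaged over each bin of width 0.5, and `cdf_at_zero` approximates 0.5, with both approximations improving as the sample size grows.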

Suppose we now shift the focus in (1.1) from densities to samples. In terms of densities, the
inference process is encapsulated in the updating of the prior density, p(θ), to the posterior density,
p(θ|x), through the medium of the likelihood function, l(θ;x). Shifting to samples, this corresponds to
the updating of a sample from p(θ) to a sample from p(θ|x) through the likelihood function l(θ;x).

In  section  3,  we  examine  two  resampling  ideas  which  provide  techniques  whereby  samples  from 
one  distribution  may  be  modified  to  form  samples  from  another  distribution.  In  section  4,  we  illustrate 
how  these  ideas  may  be  utilized  to  modify  prior  samples  to  posterior  samples,  as  well  as  to  modify 
posterior  samples  arising  under  one  model  specification  to  posterior  samples  arising  under  another. 
3.  Two  resampling  methods

Suppose that a sample of random variates is easily generated, or has already been generated,
from a continuous density g(θ), but that what is really required is a sample from a density h(θ)
absolutely continuous with respect to g(θ). Can we somehow utilize the sample from g(θ) to form a
sample from h(θ)? Slightly more generally, given a positive function f(θ) which is normalizable to such a
density h(θ) = f(θ)/∫f(θ) dθ, can we form a sample from the latter given only a sample from g(θ) and
the functional form of f(θ)?


3.1  Random  variates  via  the  rejection  method 

In the case where there exists an identifiable constant M > 0 such that f(θ)/g(θ) ≤ M, for all θ,
the answer is yes, and the procedure is as follows (see, for example, Ripley, 1986, p.60):

(i) generate θ from g(θ);

(ii) generate u from uniform(0, 1);

(iii) if u ≤ f(θ)/Mg(θ) accept θ; otherwise, repeat (i)-(iii).

Any accepted θ is then a random variate from h(θ) = f(θ)/∫f(θ) dθ.
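
Steps (i)-(iii) can be sketched in a few lines. The example below is illustrative only and rests on our own assumptions: g is taken to be the uniform(0, 1) density, f(θ) = θ²(1 - θ) is an unnormalized Beta(3, 2) kernel, and M = 4/27 is the supremum of f/g, attained at θ = 2/3. The name `rejection_resample` is hypothetical. The function applies steps (ii) and (iii) to each member of a sample already drawn from g.

```python
import random

def rejection_resample(sample, f, g, M, rng):
    """Apply steps (ii)-(iii) to each theta already drawn from g."""
    kept = []
    for theta in sample:
        u = rng.random()                    # step (ii): u from uniform(0, 1)
        if u <= f(theta) / (M * g(theta)):  # step (iii): accept, else discard
            kept.append(theta)
    return kept

def f(theta):
    """Unnormalized target (assumed for illustration): a Beta(3, 2) kernel."""
    return theta ** 2 * (1.0 - theta)

def g(theta):
    """Density of the available sample (assumed): uniform on (0, 1)."""
    return 1.0

M = 4.0 / 27.0  # sup of f/g, attained at theta = 2/3

rng = random.Random(1)  # fixed seed, for reproducibility only
g_sample = [rng.random() for _ in range(20000)]         # step (i): sample from g
h_sample = rejection_resample(g_sample, f, g, M, rng)   # sample from h

acceptance_rate = len(h_sample) / len(g_sample)
h_mean = sum(h_sample) / len(h_sample)  # the Beta(3, 2) mean is 0.6
```

Under these assumptions ∫f(θ) dθ = 1/12, so roughly a proportion (1/12)/(4/27) = 0.5625 of the θ values should be retained.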

Hence, for a sample θ_i, i = 1,...,n, from g(θ), in resampling to obtain a sample from h(θ) we
will tend to retain those θ_i for which the ratio of f relative to g is large, in agreement with intuition.
Resulting sample size is random. Since it may be shown that the probability of acceptance of a
random θ from g is M^{-1} ∫f(θ) dθ, the expected sample size for the resampled θ_i's is M^{-1} n ∫f(θ) dθ.
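
The acceptance probability can be verified in one line. As a sketch, for θ drawn from g and u drawn independently from uniform(0, 1), with integrals taken over the region where g(θ) > 0,

\[ P(\text{accept}) = \int P\!\left( u \le \frac{f(\theta)}{M g(\theta)} \right) g(\theta)\, d\theta = \int \frac{f(\theta)}{M g(\theta)}\, g(\theta)\, d\theta = M^{-1} \int f(\theta)\, d\theta , \]

and the density of an accepted θ is therefore

\[ \frac{g(\theta)\, f(\theta) / \{ M g(\theta) \}}{M^{-1} \int f(\theta)\, d\theta} = \frac{f(\theta)}{\int f(\theta)\, d\theta} = h(\theta). \]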

3.2  Random  variates  via  a  weighted  bootstrap 

In cases where the bound M required in the above procedure is not readily available, we may still
approximately resample from h(θ) = f(θ)/∫f(θ) dθ as follows. Given θ_i, i = 1,...,n, a sample from
g, calculate ω_i = f(θ_i)/g(θ_i) and then

\[ q_i = \frac{\omega_i}{\sum_{j=1}^{n} \omega_j} . \]

Draw θ* from the discrete distribution over {θ_1,...,θ_n} placing mass q_i on θ_i. Then θ* is
approximately distributed according to h with the approximation 'improving' as n increases. We
provide a justification for this claim in a moment. However, first note that this procedure is a variant
of the by now familiar bootstrap resampling procedure (Efron, 1982). The usual bootstrap provides
equally likely resampling of the θ_i, while here we have weighted resampling with weights determined
by the ratio of f to g, again in agreement with intuition.

Returning to our claim, suppose for convenience that θ is univariate. Under the customary
bootstrap, θ* has c.d.f.

\[ P(\theta^* \le a) = \frac{1}{n} \sum_{i=1}^{n} 1_{(-\infty, a]}(\theta_i) \;\xrightarrow[n \to \infty]{}\; E_g\, 1_{(-\infty, a]}(\theta) = \int_{-\infty}^{a} g(\theta)\, d\theta \]

so that θ* is approximately distributed as an observation from g(θ). Similarly, under the weighted
bootstrap, θ* has c.d.f.
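
\[ P(\theta^* \le a) = \sum_{i=1}^{n} q_i\, 1_{(-\infty, a]}(\theta_i) = \frac{n^{-1} \sum_{i=1}^{n} \omega_i\, 1_{(-\infty, a]}(\theta_i)}{n^{-1} \sum_{i=1}^{n} \omega_i} \;\xrightarrow[n \to \infty]{}\; \frac{E_g \left[ \frac{f(\theta)}{g(\theta)}\, 1_{(-\infty, a]}(\theta) \right]}{E_g \left[ \frac{f(\theta)}{g(\theta)} \right]} = \frac{\int_{-\infty}^{a} f(\theta)\, d\theta}{\int_{-\infty}^{\infty} f(\theta)\, d\theta} = \int_{-\infty}^{a} h(\theta)\, d\theta \]

so that θ* is approximately distributed as an observation from h. Note that the sample size under such
resampling can be as large as desired. We mention one important caveat. The less h resembles g the
larger the sample size n will need to be in order that the distribution of θ* well approximates h.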
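
The weighted bootstrap is equally short to code. The sketch below is illustrative and makes its own assumptions: g is the standard normal density, f is an unnormalized N(1, 0.5²) kernel, and the name `weighted_bootstrap` is hypothetical. No bound M is needed.

```python
import math
import random

def weighted_bootstrap(sample, f, g, size, rng):
    """Draw `size` values from {theta_1, ..., theta_n}, placing mass q_i on theta_i."""
    omega = [f(theta) / g(theta) for theta in sample]  # omega_i = f(theta_i) / g(theta_i)
    total = sum(omega)
    q = [w / total for w in omega]                     # q_i = omega_i / sum_j omega_j
    return rng.choices(sample, weights=q, k=size)

def f(theta):
    """Unnormalized target (assumed for illustration): kernel of N(1, 0.5^2)."""
    return math.exp(-((theta - 1.0) ** 2) / (2.0 * 0.25))

def g(theta):
    """Density of the available sample (assumed): standard normal."""
    return math.exp(-0.5 * theta ** 2) / math.sqrt(2.0 * math.pi)

rng = random.Random(2)  # fixed seed, for reproducibility only
g_sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]
h_sample = weighted_bootstrap(g_sample, f, g, 10000, rng)

h_mean = sum(h_sample) / len(h_sample)  # the target mean is 1
h_sd = math.sqrt(sum((t - h_mean) ** 2 for t in h_sample) / len(h_sample))  # target sd is 0.5
```

The resample size (here 10000) is chosen freely, whereas the quality of the approximation is governed by n, the size of the original sample from g.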


Finally, the fact that either resampling method allows h to be known only up to a proportionality
constant, i.e. only through f, is crucial, since in our Bayesian applications we wish to avoid the
integration required to standardize f.


4.  Bayesian  calculations  via  sampling-resampling

Both methods of the previous section may be used to resample the posterior (h) from the prior
(g) and also to resample a second posterior (h) from a first (g). In this section we give details of both
applications.


4.1  Prior  to  posterior 

How does Bayes theorem generate a posterior sample from a prior sample? For fixed x, define
f_x(θ) = l(θ;x)p(θ). If θ̂ maximizes l(θ;x), let M = l(θ̂;x). Then with g(θ) = p(θ), we may
immediately apply the rejection method of section 3.1 to obtain samples from the density corresponding
to f_x standardized, which, from (1.1), is precisely the posterior density p(θ|x). Thus, we see that Bayes
theorem, as a mechanism for generating a posterior sample from a prior sample, takes the following
simple form:

for each θ in the prior sample accept θ into the posterior sample with probability

\[ \frac{f_x(\theta)}{M p(\theta)} = \frac{l(\theta; x)}{l(\hat{\theta}; x)} , \]

otherwise reject it.
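
As a minimal sketch of this mechanism, consider binomial data with a uniform prior. The data (7 successes in 10 trials), the choice of prior, and the function names below are our own illustrative assumptions.

```python
import random

def likelihood(theta, successes, trials):
    """Binomial likelihood l(theta; x), up to a constant factor."""
    return theta ** successes * (1.0 - theta) ** (trials - successes)

def prior_to_posterior(prior_sample, successes, trials, rng):
    """Accept each prior theta with probability l(theta; x) / l(theta_hat; x)."""
    theta_hat = successes / trials               # maximizes the likelihood
    M = likelihood(theta_hat, successes, trials)
    return [theta for theta in prior_sample
            if rng.random() <= likelihood(theta, successes, trials) / M]

rng = random.Random(3)  # fixed seed, for reproducibility only
prior_sample = [rng.random() for _ in range(20000)]  # uniform(0, 1) prior (assumed)
posterior_sample = prior_to_posterior(prior_sample, 7, 10, rng)

posterior_mean = sum(posterior_sample) / len(posterior_sample)
retained = len(posterior_sample) / len(prior_sample)
```

In this conjugate case the exact posterior is Beta(8, 4), with mean 2/3, which provides a check on the sample; about 34% of the prior sample should be retained.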

The likelihood therefore acts as a resampling probability; those θ in the prior sample having
high likelihood are more likely to be retained in the posterior sample. Of course, since
p(θ|x) ∝ l(θ;x)p(θ) we can also straightforwardly resample using the weighted bootstrap with

\[ q_i = \frac{l(\theta_i; x)}{\sum_{j=1}^{n} l(\theta_j; x)} . \]
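
The weighted bootstrap version can be sketched as follows, assuming (for illustration only) five observations from a N(θ, 1) model and a N(0, 2²) prior; the data values and function names are hypothetical.

```python
import math
import random

data = [1.2, 0.8, 1.9, 1.4, 0.7]  # hypothetical observations, assumed N(theta, 1)

def likelihood(theta, data):
    """N(theta, 1) likelihood l(theta; x), up to a constant factor."""
    return math.exp(-0.5 * sum((x - theta) ** 2 for x in data))

def posterior_by_weighted_bootstrap(prior_sample, data, size, rng):
    """Resample the prior sample with mass q_i proportional to l(theta_i; x)."""
    l_values = [likelihood(theta, data) for theta in prior_sample]
    total = sum(l_values)
    q = [value / total for value in l_values]
    return rng.choices(prior_sample, weights=q, k=size)

rng = random.Random(4)  # fixed seed, for reproducibility only
prior_sample = [rng.gauss(0.0, 2.0) for _ in range(20000)]  # N(0, 2^2) prior (assumed)
posterior_sample = posterior_by_weighted_bootstrap(prior_sample, data, 5000, rng)

posterior_mean = sum(posterior_sample) / len(posterior_sample)
```

For this conjugate set-up the exact posterior mean is 6/5.25, approximately 1.143, so the resampled mean can be checked directly. Note that the prior density never has to be evaluated, since it cancels in the weights.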


Several  obvious  uses  of  this  sampling-resampling  perspective  are  immediate.  Using  large  prior 
samples and iterating the resampling process for successive individual data elements (for
two-dimensional θ, say) provides a simple pedagogic tool for illustrating the sequential Bayesian learning
process,  as  well  as  the  increasing  concentration  of  the  posterior  as  the  amount  of  data  increases.  In 
addition,  the  approach  provides  natural  links  with  elementary  graphical  displays;  e.g.  histograms,  stem 
and  leaf  displays,  boxplots  to  summarize  univariate  marginal  posterior  distributions,  scatterplots  to 
summarize  bivariate  posteriors,  etc.  In  general,  the  translation  from  functions  to  samples  provides  a 
wealth  of  opportunities  for  creative  exploration  of  Bayesian  ideas  and  calculations  in  the  setting  of 
computer  graphical  and  EDA  tools. 
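
The sequential use can be sketched with the rejection step applied one observation at a time. For brevity the example below takes a univariate θ (a Bernoulli success probability) with a uniform prior and four hypothetical observations; these choices, and the name `update`, are our own assumptions. For a single Bernoulli observation the likelihood is θ or 1 - θ, whose supremum is M = 1.

```python
import random

def update(sample, x, rng):
    """Process one Bernoulli observation x: keep theta with probability l(theta; x) / M, M = 1."""
    return [theta for theta in sample
            if rng.random() <= (theta if x == 1 else 1.0 - theta)]

rng = random.Random(5)  # fixed seed, for reproducibility only
sample = [rng.random() for _ in range(100000)]  # large uniform(0, 1) prior sample (assumed)
observations = [1, 1, 0, 1]                     # hypothetical data, one element at a time

sizes = [len(sample)]
means = [sum(sample) / len(sample)]
for x in observations:
    sample = update(sample, x, rng)  # the current sample is the prior for the next datum
    sizes.append(len(sample))
    means.append(sum(sample) / len(sample))
```

After the four observations the exact posterior is Beta(4, 2), with mean 2/3. The retained sample shrinks at each step (to about 5% of its original size here), which is why a large prior sample is needed; plotting a histogram of `sample` after each step displays the learning process.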
4.2  Posterior  to  posterior

An  important  issue  in  Bayesian  inference  is  sensitivity  of  inferences  to  model  specification.  In 
particular  we  might  ask: 

how  does  the  posterior  change  if  we  change  the  prior? 
how  does  the  posterior  change  if  we  change  the  likelihood? 

In  the  density  function  /  numerical  integration  setting,  such  sensitivity  studies  are  rather  off-putting,  in 
that  each  change  of  a  functional  input  typically  requires  one  to  carry  out  new  calculations  from 
scratch.  This  is  not  the  case  with  the  sampling-resampling  approach,  as  we  now  illustrate  in  relation 
to  the  questions  posed  above. 

In  comparing  two  models  in  relation  to  the  second  question,  we  note  that  change  in  likelihood 
may  arise  in  terms  of 

(i) change in distributional specification with θ retaining the same interpretation, e.g. a location;

(ii)  change  in  data  to  a  larger  data  set  (prediction),  a  smaller  data  set  (diagnostics),  or  a  different 
data  set  (validation). 


To unify notation, we shall in either case denote two likelihoods by l_1(θ) and l_2(θ). We denote two
different priors to be compared in relation to the first question by p_1(θ) and p_2(θ). For complete
generality, we shall consider changes to both l and p, although in any particular application we would not
typically change both. Denoting the corresponding posterior densities by p̂_1(θ), p̂_2(θ) we easily see
that

\[ \hat{p}_2(\theta) \propto \frac{l_2(\theta)\, p_2(\theta)}{l_1(\theta)\, p_1(\theta)}\, \hat{p}_1(\theta). \tag{4.2} \]

Letting v(θ) = l_2(θ)p_2(θ)/l_1(θ)p_1(θ), we note that to implement the rejection method for (4.2)
requires sup_θ v(θ). In many examples this will simplify to an easy calculation. Alternatively, we may
directly apply the weighted bootstrap method taking g = p̂_1(θ), f = v(θ)p̂_1(θ) and ω_i = v(θ_i).
Resampled θ* will then be approximately distributed according to f standardized, which is precisely p̂_2(θ).
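
As a sketch of a prior sensitivity study, suppose the first posterior arises from binomial data (7 successes in 10 trials) under a uniform prior p_1, so that it is Beta(8, 4) and can be sampled directly, and suppose the second prior p_2 is Beta(2, 2) with the likelihood unchanged. These choices are our own illustrative assumptions. Then v(θ) is proportional to θ(1 - θ), and the constant of proportionality is irrelevant.

```python
import random

def v(theta):
    """l2 p2 / (l1 p1) up to a constant: same likelihood, p1 uniform, p2 a Beta(2, 2) kernel."""
    return theta * (1.0 - theta)

rng = random.Random(6)  # fixed seed, for reproducibility only

# Sample from the first posterior, Beta(8, 4) (assumed available).
first_posterior = [rng.betavariate(8, 4) for _ in range(20000)]

# Weighted bootstrap with omega_i = v(theta_i); choices() normalizes the weights.
omega = [v(theta) for theta in first_posterior]
second_posterior = rng.choices(first_posterior, weights=omega, k=10000)

first_mean = sum(first_posterior) / len(first_posterior)     # exact value 8/12
second_mean = sum(second_posterior) / len(second_posterior)  # exact value 9/14
```

No likelihood evaluation and no new sampling from a prior is required: the existing posterior sample is simply reweighted. Here the exact second posterior is Beta(9, 5), so the shift of the mean towards 1/2 can be checked against 9/14.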

Again,  different  aspects  of  the  sensitivity  of  the  posteriors  to  changes  in  inputs  are  easily  studied 
by  graphical  examination  of  the  posterior  samples. 